Improving colposcopic accuracy for cervical precancer detection: a retrospective multicenter study in China

Background Colposcopy alone can result in misidentification of high-grade squamous intraepithelial or worse lesions (HSIL +), especially for women with Type 3 transformation zone (TZ) lesions, where colposcopic assessment is particularly imprecise. This study aimed to improve HSIL + case identification by supplementing referral screening results to colposcopic findings. Methods This is an observational multicenter study of 2,417 women, referred to colposcopy after receiving cervical cancer screening results. Logistic regression analysis was conducted under uni- and multivariate models to identify factors which could be used to improve HSIL + case identification. Histological diagnosis was established as the gold standard and is used to assess accuracy, sensitivity, and specificity, as well as to incrementally improve colposcopy. Results Multivariate analysis highlighted age, TZ types, referral screening, and colposcopists’ skills as independent factors. Across this sample population, diagnostic accuracies for detecting HSIL + increased from 72.9% (95%CI 71.1–74.7%) for colposcopy alone to 82.1% (95%CI 80.6–83.6%) after supplementing colposcopy with screening results. A significant increase in colposcopic accuracy was observed across all subgroups. Although, the highest increase was observed in women with a TZ3 lesion, and for those diagnosed by junior colposcopists. Conclusion It appears possible to supplement colposcopic examinations with screening results to improve HSIL + detection, especially for women with TZ3 lesions. It may also be possible to improve junior colposcopists’ diagnoses although, further psychological research is necessary. We need to understand how levels of uncertainty influence diagnostic decisions and what the concept of “experience” actually is and what it means for colposcopic practice.


Introduction
Effective cervical cancer screening programs can detect precancers before they progress to invasive cancers [1,2]; however, there are points in the identification process which are subjective and therefore susceptible to human error. Generally, women with abnormal screening results are referred to colposcopy, and then to biopsy for histologic diagnosis. Colposcopy with biopsy has an important role in determining treatments and further observations. If high-grade squamous intraepithelial or worse lesions i.e., HSIL + is confirmed, treatment is required in most cases [3,4]. However, colposcopic examination can be inaccurate, since up to 40% of all HSIL + cases are missed in low-and middle-income countries (LMICs) [5,6]. Of course, colposcopic accuracy can be affected by a number of factors, colposcopists' skills, screening results i.e., cytology and human papillomavirus [HPV] testing, type of transformation zone (TZ), number of biopsies, etc. [7].
We know that women with lesions in TZ3 are difficult to identify. This is because TZ3 lesions cannot easily be seen with canal-based cytology. Therefore, the number of false negatives related to TZ3 lesions is higher than other types. Poor diagnostic performance is again more evident in LMICs where there is a shortage of skills and experienced colposcopists [8]. A high proportion of women in these countries, between 20-80%, also have TZ3 lesions [9,10]. Therefore, it is vital we improve colposcopic practice through research and training. Some investigators have reported that adding referral screening results to colposcopic examinations can improve HSIL + detection [11,12]. Although, colposcopy practitioners do not consider referral screening results with colposcopy, as they might.
A recent study suggested that the use of biomarkers and HPV genotyping could improve diagnostic accuracy for women with TZ3 lesions [13]. However, these findings are based on rather limited data. The study had a small sample size, with approximately only 100 women, and therefore these findings are difficult to generalize, despite being promising. Another issue we face is that there is no comparative data from LMICs which do face a greater burden related to cervical cancer. Therefore, we have a duty to investigate methods for improving HSIL + diagnostics in LMICs such as China, to improve case identification. By supplementing colposcopic diagnosis with referral screening results it is hoped that we can learn to improve HSIL + detection around the world.

Study population
Data were retrospectively collected from digital records of women who underwent colposcopic examination from January 2018 to October 2021. All patients had attended one of three colposcopy clinics within different hospitals (municipal and provincial) in mainland China. Women who had previous treatments such as total hysterectomy or history of pelvic radiation, and those who underwent colposcopy but had no histologic report were excluded.
Demographics and clinical data were collected, including age, cytological results, HPV status, TZ types, colposcopic diagnosis, colposcopists' skills and histological results. This study was conducted in accordance with the Declaration of Helsinki and received ethical approval from the institutional review board (IRB) of Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS/PUMC). The need for informed consent was waived since the study was retrospective and personal information was anonymized.

Cytology and HPV testing
Cytology results were classified into five categories according to the Bethesda System [14] including, no intraepithelial lesion cells and malignant cells (NILM), atypical squamous cells of unknown significance (ASC-US), low-grade squamous intraepithelial lesion (LSIL), atypical squamous cells of cannot exclude high-grade squamous intraepithelial lesion (ASC-H), high-grade squamous intraepithelial lesion and/or squamous cervical carcinoma (HSIL +). The type of HPV test used was not recorded in this instance because it was not considered pertinent to this study. High-risk HPV (hr-HPV) types were defined as either HPV16/18, non-16/18 hrHPV or hrHPV negative.

Colposcopy and histology diagnosis
Referral to colposcopic examination was based on the 2011 International Federation of Cervical Pathology and Colposcopy (IFCPC) [15]. All colposcopies were performed by different colposcopists (junior and senior), which were divided into two groups: those with more than 5-year experience, and those with less than 5-year experience using digital colposcopes.
General assessment was conducted to check cervical visualizations, which included visible TZ1/2 and not visible TZ3. All colposcopically detected abnormalities were directly biopsied. If necessary, an endocervical curettage was performed after cervical biopsies. The colposcopic diagnosis was described as normal/benign findings, lowgrade, high-grade.
Histological diagnosis was performed by experienced histologists from local hospitals with disagreements considered in consultation with histologists' consultations. Results were classified as normal, low grade squamous intraepithelial lesion (LSIL), and high-grade squamous intraepithelial lesion or worse (HSIL +) according to a revised version standard of World Health Organization (WHO) classifications [16]. The histologic results were taken as the gold standard. When analyzing biopsies, excision specimens and/or endocervical curettage together, the final histological diagnosis was considered the worst grade of dysplasia present.

Statistical analysis
Data analysis was performed using SPSS (version 24.0) and R (version 4.0.5). Accuracy, sensitivity, and specificity were used to assess diagnostic performance for HSIL + . Findings were compared using a standard Chi-square test.
Univariate and multivariate logistic regression analyses with an enter approach were performed to assess independent factors which are reported as odds ratios (OR) with 95% CI for detecting HSIL + . All p values presented are two-sided and differences were considered statistically significant with p values lower than 0.05.

Evaluation of the diagnostic performance of colposcopy for detecting HSIL +
The overall colposcopic performance and its subgroup performance in different TZ types and colposcopist's skills for detecting HSIL + are showed in Table 2. The overall diagnostic accuracy, sensitivity and specificity were 72.9%, 70.2%, and 75.1%, respectively. For women with different TZ types detecting HSIL + , the accuracy (75.4% versus 70.9%, p = 0.014) and sensitivity (77.9% versus 62.2%, p < 0.001) of TZ1/2 were significantly higher than those with TZ3, while there was no statistical difference for specificity (72.7% versus 76.6%, p = 0.109). For women diagnosed by different colposcopists' skills detecting HSIL + , the accuracy (69.6% versus 78.6%, p < 0.001) and sensitivity (62.0% versus 79.8%, p < 0.001) of senior colposcopists were significantly higher than those with the junior colposcopists. As for specificity (77.1% versus 74.3%, p = 0.290), the senior colposcopists was slightly higher than junior colposcopists, but no statistical difference between different colposcopists' skills. Table 3 presents univariate and multivariate logistic regression analysis. We showed that the age group, cytology and, HPV status, TZ types, and colopscopists' skills appear as clinical influences over colposcopic accuracy of detecting HSIL + . In multivariate logistic regression, women whose HPV status were HPV16/18 (OR, 4.45), and cytological results were LSIL (OR, 2.42), ASC-H (OR, 3.41) and HSIL + (OR, 28.26) appears to positively correlate with higher odds of accurate detecting histological HSIL + . These findings indicate that a combination of the referral screening test results, as this could improve the colposcopic accuracy.

Clinical factors analysis affecting the colposcopic accuracy of detecting HSIL +
Diagnostic performance of colposcopy combined with referral screening results for detecting HSIL + Table 4 reports the diagnostic performance of adding the referral screening test results to colposcopy for detecting HSIL + . Fig. 1 shows the incremental accuracy and    diagnosed by junior colposcopists, the corresponding yields for HSIL + increased from 69.6% to 80.0%, and from 62.0% to 92.2%, respectively. we found that the highest accuracy (12.1%) and sensitivity (38.7%) of in increment yields for HSIL + were observed in the subgroup of women with a TZ3, and diagnosed by junior colposcopists.

Discussion
The aim of this study was to supplement colposcopic diagnosis with referral screening results with a view to improving HSIL + detection in China, and around the world. Data from 2,417 women who attended colposcopy clinics across China were collated which enabled us to assess colposcopic performance. Evidence was then stratified according to TZ types and around colposcopists' skills and experience. The primary goal was determined whether supplementing colposcopic diagnosis with referral screening results can add clinical value. Overall, colposcopy alone had a relatively low level of accuracy i.e., 72.9% for detecting HSIL + however, this was not surprising. Colposcopic agreement with biopsies varies substantially between and within countries but also between specialist clinics. This is because a colposcope is operated by human clinicians over a period of 20-40 min, which increases the risk of error. The sensitivity of colposcopy alone for detecting HSIL + in this study was 70.2%, with a specificity of 75.1%. These appear comparable to previous studies conducted in China. For example, in a previous study we found that colposcopic accuracy for detecting HSIL + was 69.7% [17]. Likewise, Li et al. reported 64.95% accuracy for colposcopically directed biopsy when identifying HSIL + cases [18]. Another study by Fan et al. [19] found agreement between colposcopic diagnosis and histology of 65.5% in China. Taken together this evidence suggests that colposcopy is a useful tool to detect HSIL + but it is imprecise. As has been mentioned, colposcopic accuracy is biased on colposcopists's level subjective experience, TZ and etc. Each of these cause inaccuracies which have prompted studies to improve colposcopic HSIL + detection. It is important to be aware that these inaccuracies have real-world consequences which should be avoided, if at all possible. Many may simply defer to biopsy but we must also consider the psychological implications for patients and the economics which all health systems must factor into treatment pathways.
One clear issue that we must address is how colposcopists' skills can affect colposcopic accuracy for detecting HSIL + . Therefore, in this study we performed subgroup analysis of senior and junior colposcopists. Unsurprisingly, we found colposcopic accuracy was higher for senior colposcopists with more than five years of experience. This is consistent with other findings from around the world. For example, Baum et al. reported related evidence that junior colposcopists have a tendency to overestimate [20] however, some have found opposing evidence. For instance, Stuebs et al. [11] found no significant difference between colposcopists' according around experience. Although, their evidence may be biased because of patient selection. Patients with more complex symptoms are also more likely to be attended to by senior colposcopits. So, it is fair to say that experience is more likely to benefit patients but can also create more assumptions or lapses which lead to misdiagnoses.
Transformation zones are the most important anatomical entities where CIN and invasive carcinoma arise. However, as we know colposcopy is not always adequately performed. Among other reasons, this is because transformation zones are often obscured from view but inflammation, atrophy, bleeding etc. This is especially true for TZ3 lesions which can be 'tucked-inside' the cervix and therefore are less visible. We found that 56.1% of all cases had TZ3 lesions, which likely meant that colposcopies were unsatisfactorily completed. This seems high but again this varies substantially. For example, Zhang et al. [10]. studied 1,838 cases and found 18.2% were TZ 1 or 2, and the remaining 81.8% were TZ3 cases. While Fan et al. [19] found 83.6% were TZ1 or 2, and 16.4% were TZ3 of the 1,892 cases studied. In a German study researchers found that TZ3 lesions accounted for 81% of the 3,761 cases studied [21]. Again, these variations may be explained by the distribution of lesions within the cervix. We found significant heterogeneity in the distribution of different TZs in different studies although, this would need to be systematically reviewed. It may be possible to develop a type of transpositional image-based meta-analysis to study transformation zones although it is beyond this study to discuss this further.
Evidence from this study suggests there is a higher accuracy (75.4% versus 70.9%) and sensitivity for HSIL + (77.9% versus 62.2%) in women with TZ1/2 compared to women with TZ3. However, there was no significant difference in TZ types for specificity (72.7% versus 76.6%). Stuebs et al. [11]. reported similar results with higher accuracies for TZ1/2 and lower accuracies for TZ3. In another study, it has also been found that the accuracy of TZ1 or 2 is higher than for TZ3 [22]. These studies suggest that transformation zones around of critical importance, and affect colposcopic accuracy, especially for TZ3. This is also supported by meta-analytical results by Ren et al. [7]. However, the key issue is how aware a colposcopist is of the unsatisfactory nature of the examination and of course, what she or he decides to do next for the patient. Further research is needed into the decision-making process and psychological factors which drive "next-step" processes in the diagnostic pathway. We need insight into levels of uncertainty which initiate colposcopists to order biopsies or to schedule followup examinations. Knowing this would take us beyond the notion that 'experience' as the important factor, and ground us into developing more sophisticated colposcopy training.
We performed subgroup analysis according to TZ types and colposcopists' levels of experience. In the junior colposcopist subgroup, we found similar results, with a higher sensitivity of 71.7% for TZ1/2, compared to TZ3 which was 52.5%. For senior colposcopists, we found that the difference of colposcopic sensitivity between women with TZ1/2 and TZ3 was not so largely. One explanation for this is that, colposcopists with more than five years of clinical experience are more likely to accurately identify TZ3 HSIL + . Although, it may also mean that a more senior colposcopist is more likely to seeking confirmation and rather use biopsy to disprove a diagnostic assertion. It is therefore, important that the 2011 IFCPC guidelines have incorporated classification transformation zone classification as an obligatory terminology [15]. However again, there are discrepancies due to the lack of prospective and randomized clinical trials which record and report this data. Further research is needed to understand the dynamics involved and to standardize guidelines because at present transformation zone terminology is not considered in the ASCCP guidelines [23,24].
We also conducted univariate and multivariate analysis to explore factors which influence colposcopic accuracy at detecting HSIL + . We found that the age, cytology, HPV test results, TZ type and the colposcopists' skills are all independent factors. Among influential factors, cytology and HPV testing appear to be significantly related to HSIL + detection with relatively high odds ratios. These findings are also consistent with previous clinical studies conducted in China [12,17,25]. This is understandable because women referred to colposcopy with high-risk screening results (e.g., HPV16/18 + and high-grade cytology) are more likely to be HSIL + cases. By contrast, women with low-risk screening results (e.g., HPV16/18and ASC-US cytology) were at low-risk of HSIL + . Therefore, European and American relevant guidelines highlight the value of adding screening results to colposcopy impression for improved HSIL + case identification [24,26].
In order to develop this body of evidence, we supplemented colposcopic diagnosis with referral screening results. By adding this we managed to increase accuracy by 9.2%, and sensitivity by 22.6%, which is considerable and despite slightly lowering specificity by 1.6%. We also observed similar results in different TZ types and according to colposcopists' skills. Although, the greatest sensitivity increase was observed for previously poor performance in the TZ3 subgroup and in junior colposcopists. Overall, our results extend the evidence from clinical studies which suggest that combining colposcopic findings with referral screening results can markedly improve HSIL + detection.
However, it is important consider what else could be done to further improve colposcopic accuracy and ensure the WHO's goal of eliminating cervical cancer worldwide by 2030 can be achieved. Of course, HPV vaccinations will surely help as will the move from operator-dependent cytology to less operator-dependent HPV detection. Although, the practice of colposcopy faces a number of unpredictable challenges [27]. Therefore, guidelines should be regularly updated (according to best evidence) to meet the needs of real-world clinical practice. A number of possible factors (e.g., pre-colposcopy assessment, extended HPV genotyping, biomarkers, dual staining and methylation, etc.) could potentially improve diagnostic performance although, these are not yet available as point-of-care testing. Recently, the Polish Society of Colposcopy and Cervical Pathophysiology has suggested to include a series of pre-colposcopy assessments, detection techniques and HPV vaccination status, to optimize colposcopy practice toward tailored management [28]. Our findings indicate the benefits of adding screening results to colposcopic diagnosis to better identify HSIL cases. However, more data should be provided to support optimal colposcopy strategy development, based on supplementary measures which improve the diagnostic accuracy of colposcopy. Unfortunately, as well as trying to manage diagnostic uncertainty and our patients' wellbeing, we are also increasingly being encouraged to consider the financial implications of prognostics. Moreover, there is a shortage of experienced colposcopists, especially in LMICs. Perhaps, artificial intelligence-guided colposcopy can improve this form of diagnostics [8,29] although we need to test and validate new models. As always, these advances will have implications for medical education which should not be left as an afterthought.
To the best of our knowledge, this is the first study to consider the diagnostic value for identifying HSIL + in women with different TZs and colposcopist' skills by integrating colposcopic impressions with referral screening results across a Chinese multicenter study. However, there were several limitations that should be mentioned. Firstly, this was a retrospective study design which comes with a number of well-reported issues. Second, we previously found that taking multiple colposcopy-directed biopsies can increase HSIL detection [30][31][32], therefore we did not extract all pertinent data for analysis. Third, we tried to investigate the influence age, although this was not included because we were unable to extract sufficient data to understand how age influences HSIL + detection. We are planning further research to understand factors involved in women's development which impact on case detection. Finally, we have only studied colposcopic accuracy for detecting HSIL + . The data required to discern differences between HSIL + , cervical intraepithelial neoplasia grade 2 or worse, and CIN3 + are still needed.

Conclusion
Our findings reaffirm that colposcopic HSIL + detection is imprecise, especially for women with TZ3 lesions and those diagnosed by junior colposcopists. It is also possible to improve upon this by supplementing colposcopic findings with referral screening results. Perhaps, we could develop a type of transpositional image-based meta-analysis to study TZs although this is interdisciplinary research which requires more thought. We also need to conduct more psychological research to gain insight into colposcopists' levels of uncertainty and how they acknowledge and respond to this sensation. At present, the concept of "experience" is based solely upon a time threshold, we have no understanding of what "experience" actually means for colposcopists.